-
With the emergence of wearable devices and other embedded systems, deploying large language models (LLMs) on edge platforms has become an urgent need. However, this is challenging because of their high computational and memory demands. Although recent low-bit quantization methods (e.g., BitNet, DeepSeek) compress weights to as low as 1.58 bits with minimal accuracy loss, edge deployment is still constrained by limited on-chip resources, power budgets, and the often-neglected long latency of the prefill stage. We present TeLLMe, the first table-lookup-based ternary LLM accelerator for low-power edge FPGAs that fully supports both prefill and autoregressive decoding using 1.58-bit weights and 8-bit activations. TeLLMe incorporates several novel techniques, including (1) a table-lookup-based ternary matrix multiplication (TLMM) engine utilizing grouped activations and online precomputation for low resource utilization and high throughput; (2) a fine-grained analytic URAM-based weight buffer management scheme for efficient loading and compute engine access; (3) a streaming dataflow architecture that fuses floating-point element-wise operations with linear computations to hide latency; (4) a reversed-reordered prefill-stage attention with fused attention operations for high memory efficiency; and (5) a resource-efficient specialized decoding-stage attention. Under a 5 W power budget, TeLLMe delivers up to 25 tokens/s decoding throughput and 0.45–0.96 s time-to-first-token (TTFT) for 64–128-token prompts, marking a significant energy-efficiency advancement in LLM inference on edge FPGAs.
Free, publicly accessible full text available October 21, 2026.
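To make the table-lookup idea concrete, here is a minimal NumPy sketch of a TLMM-style multiply under stated assumptions: for each group of four activations it precomputes the dot product with every possible ternary pattern once (the online precomputation), so every weight row then costs only a base-3 table lookup. The function name, group size, and encoding are illustrative, not the paper's hardware design.

    import itertools
    import numpy as np

    def tlmm(weights, acts, group=4):
        # Sketch only: weights in {-1, 0, +1}, acts an int8 vector.
        out_dim, in_dim = weights.shape
        assert in_dim % group == 0
        # All 3^group ternary patterns, ordered so index = base-3 encoding.
        patterns = np.array(list(itertools.product((-1, 0, 1), repeat=group)))
        pows = 3 ** np.arange(group - 1, -1, -1)  # base-3 place values
        y = np.zeros(out_dim, dtype=np.int64)
        for g in range(in_dim // group):
            a = acts[g * group:(g + 1) * group].astype(np.int64)
            table = patterns @ a  # 3^group partial sums, built once per group
            idx = (weights[:, g * group:(g + 1) * group] + 1) @ pows  # {-1,0,1} -> {0,1,2}
            y += table[idx]       # one lookup per output row
        return y

With group=4 the table holds only 3^4 = 81 entries per activation group, which is why this style of scheme maps well onto small LUT-based FPGA fabrics.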
-
Transformer-based models have demonstrated superior performance in various fields, including natural language processing and computer vision. However, their enormous model size and high demands on computation, memory, and communication limit their deployment to edge platforms for local, secure inference. Binary transformers offer a compact, low-complexity solution for edge deployment with reduced bandwidth needs and acceptable accuracy. However, existing binary transformers perform inefficiently on current hardware due to the lack of binary-specific optimizations. To address this, we introduce COBRA, an algorithm-architecture co-optimized binary transformer accelerator for edge computing. COBRA features a real 1-bit binary multiplication unit, enabling matrix operations with -1, 0, and +1 values and surpassing ternary methods. With further hardware-friendly optimizations in the attention block, COBRA achieves up to 3,894.7 GOPS throughput and 448.7 GOPS/Watt energy efficiency on edge FPGAs, delivering a 311× energy-efficiency improvement over GPUs and a 3.5× throughput improvement over the state-of-the-art binary accelerator, with only negligible inference accuracy degradation.
Free, publicly accessible full text available October 27, 2026.
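As a rough software analogue of a 1-bit multiplication unit, the sketch below evaluates a {-1, +1} dot product with XNOR and popcount (+1 encoded as bit 1, -1 as bit 0): matching bits contribute +1 and mismatches -1. How the 0 value is represented in hardware is not covered here, so this sketch handles only the two-valued case; all names are illustrative, not COBRA's datapath.

    def pack_pm1(vec):
        # Pack a +1/-1 vector into an integer bitmask (+1 -> 1, -1 -> 0).
        bits = 0
        for i, v in enumerate(vec):
            if v == 1:
                bits |= 1 << i
        return bits

    def xnor_popcount_dot(a_bits, b_bits, n):
        # dot over {-1,+1}^n: each matching bit adds +1, each mismatch -1,
        # so dot = 2 * popcount(XNOR(a, b)) - n.
        matches = ~(a_bits ^ b_bits) & ((1 << n) - 1)
        return 2 * bin(matches).count("1") - n

    # Example: dot([1, -1, 1], [1, 1, -1]) = 1 - 1 - 1 = -1
    assert xnor_popcount_dot(pack_pm1([1, -1, 1]), pack_pm1([1, 1, -1]), 3) == -1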
-
We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. RoboGen leverages the latest advancements in foundation and generative models. Instead of directly adapting these models to produce policies or low-level actions, we advocate for a generative scheme, which uses these models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision. Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent first proposes interesting tasks and skills to develop, and then generates simulation environments by populating pertinent assets with proper spatial configurations. Afterwards, the agent decomposes the proposed task into sub-tasks, selects the optimal learning approach (reinforcement learning, motion planning, or trajectory optimization), generates required training supervision, and then learns policies to acquire the proposed skill. Our fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.
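The propose-generate-learn cycle reads naturally as a loop. The skeleton below is a hypothetical interface, included only to make the stages explicit; none of the object or method names come from RoboGen's actual codebase.

    # Hypothetical skeleton of the propose-generate-learn cycle; `fm` (a
    # foundation-model wrapper), `sim`, and every method name are
    # illustrative stand-ins, not RoboGen's API.
    def propose_generate_learn(fm, sim, n_cycles):
        skills = []
        for _ in range(n_cycles):
            task = fm.propose_task()                  # 1. propose a task/skill
            scene = fm.generate_scene(task)           # 2. assets + spatial layout
            sim.load(scene)
            for sub_task in fm.decompose(task):       # 3. split into sub-tasks
                method = fm.select_method(sub_task)   # RL, motion planning, or traj. opt.
                reward = fm.generate_supervision(sub_task)  # 4. training signal
                policy = method.train(sim, reward)    # 5. learn the skill
                skills.append((sub_task, policy))
        return skills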
-
In a connected world, fair graph learning is becoming increasingly important because of growing concerns about bias. Yet the vast majority of existing work assumes that the input graph comes from a single view, ignoring the multi-view essence of graphs. Generally speaking, bias in graph mining is often rooted in the input graph and is further introduced or even amplified by the graph mining model. This poses critical research questions regarding the intrinsic relationships of fairness across different views and the possibility of mitigating bias on multiple views simultaneously. To answer these questions, in this paper we explore individual fairness in multi-view graph mining. We first demonstrate the necessity of fair multi-view graph learning. Building upon the optimization perspective of fair single-view graph mining, we then formulate our problem as a linear weighted optimization problem. To determine the weight of each view, we resort to minimax Pareto fairness, which is closely related to the Rawlsian difference principle, and propose an effective solver named iFiG that minimizes the utility loss while promoting individual fairness for each view, with two different instantiations. Extensive experiments on multi-view spectral clustering and INFORM post-processing demonstrate the efficacy of our proposed method in individual bias mitigation.
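One standard way to approach a minimax objective over views is a multiplicative-weights loop that repeatedly upweights the worst-off view; the sketch below takes that route under stated assumptions. The update rule, step size, and all names here are illustrative, not iFiG's actual solver.

    import numpy as np

    def minimax_view_weights(bias_losses, update_model, theta, k_views,
                             steps=100, eta=0.1):
        # bias_losses(theta) -> length-k_views array of per-view
        # individual-bias losses; update_model(theta, w) -> parameters
        # minimizing the w-weighted objective. Both are caller-supplied.
        w = np.full(k_views, 1.0 / k_views)
        for _ in range(steps):
            losses = np.asarray(bias_losses(theta))
            w *= np.exp(eta * losses)   # shift weight toward the worst-off views
            w /= w.sum()                # keep w a distribution over views
            theta = update_model(theta, w)
        return w, theta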